Do not suppress candidate tokens in the middle of a rule#15085
Do not suppress candidate tokens in the middle of a rule#15085martint merged 1 commit intotrinodb:masterfrom
Conversation
There was a problem hiding this comment.
This doesn't seem right. The error position is reported as the end of UNBOUNDED:
SELECT X() OVER (ROWS UNBOUNDED) FROM T
^
The only valid tokens at that point are FOLLOWING and PRECEDING, so we shouldn't be reporting all of these extra tokens.
There was a problem hiding this comment.
Yes, I know. I’m trying to figure out why it’s doing that. There’s some other bug lurking in there.
There was a problem hiding this comment.
It turns out these are all valid tokens per the grammar. It's because UNBOUNDED is a non-reserved word, so it can take the place of an identifier. In particular, it's considering alternatives that include the following productions:
over
: OVER ([...] | '(' windowSpecification ')')
;
windowSpecification
: [...]
windowFrame?
;
windowFrame
: [...]
frameExtent
;
frameExtent
: [...]
| frameType=ROWS start=frameBound
| [...]
;
frameBound
: [...]
| expression boundType=(PRECEDING | FOLLOWING) #boundedFrame
;
And <expression> can be an <identifier> followed by a number of things, including ., [, OVER, etc.
For instance, this goes through the parser successfully even though it's clearly wrong if we consider UNBOUNDED a special keyword:
SELECT X() OVER (ROWS UNBOUNDED -> 1 PRECEDING) FROM t
It's treating UNBOUNDED -> 1 as a lambda expression, which is a valid (syntax-wise) expression in the "#boundedFrame" frame bound above.
There was a problem hiding this comment.
If I mark UNBOUNDED as a reserved word, then it produces the expected message:
line 1:32: mismatched input ')'. Expecting: 'FOLLOWING', 'PRECEDING'
There was a problem hiding this comment.
Is there a way to make the error handler treat UNBOUNDED as a reserved word without actually making it reserved? In my opinion, printing all alternatives for the case of UNBOUNDED being an identifier is unnecessary and confusing, even if those are all correct per grammar.
There was a problem hiding this comment.
Another issue I can spot here is that the error handler suggests GROUPS, RANGE, and ROWS at position 23:
Arguments.of("SELECT x() over (ROWS select) FROM t",
"line 1:23: mismatched input 'select'. Expecting: 'BETWEEN', 'CURRENT', 'UNBOUNDED', <expression>"),
"line 1:23: mismatched input 'select'. Expecting: ')', 'BETWEEN', 'CURRENT', 'GROUPS', 'MEASURES', 'ORDER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', <expression>"),
It is not correct, as position 23 is after the word ROWS. It seems that the reported error position is the innermost one, but the suggestions come from an outer rule.
There was a problem hiding this comment.
It is correct (grammar-wise) because ROWS can be interpreted as an identifier representing a named window. After that, any of the terms PARTITION, ORDER, or any of the prefix tokens in windowFrame are valid.
At the end of the day, this fixes a bug in the parser error handler that should've never existed in the initial implementation, so we should get it in. I can't think of any way to avoid the spurious tokens due to the non-reserved keyword handling without adding semantic knowledge to the grammar about the behavior of each rule, but I have some ideas that are out of the scope of this PR.
5e1e180 to
0dffe76
Compare
There was a problem hiding this comment.
Do you think we can remove ignoredRule from ErrorHandler now?
AFAIK, nonReserved was the only ignored rule.
0dffe76 to
2cc1690
Compare
2cc1690 to
335dfbe
Compare
335dfbe to
016af5b
Compare
When the parsing failure occurs in the middle of a rule (as opposed to at the start of a rule), do not suppress the candidate tokens even if the rule is marked as suppressed. For instance, if the parsing fails immediately when recursing into primaryExpression, the candidates from that rule will be suppressed to avoid polluting the error message with all the possible prefixes of that rule (EXISTS, CASE, TRY_CAST, CURRENT_XXX, etc). However, if the parsing fails after <identifier> is matched within the <identifier> <string> alternative, include <string> in set of expected candidates.
016af5b to
0e6e498
Compare
When the parsing failure occurs in the middle of a rule (as opposed to at the start of a rule), do not suppress the candidate tokens even if the rule is marked as suppressed.
For instance, if the parsing fails immediately when recursing into primaryExpression, the candidates from that rule will be suppressed to avoid polluting the error message with all the possible prefixes of that rule (EXISTS, CASE, TRY_CAST, CURRENT_XXX, etc).
However, if the parsing fails after is matched within the alternative, include in set of expected candidates.
Fixes #15044
Release notes
(x) This is not user-visible or docs only and no release notes are required.